The data needed for this exercise is taken from a similar course, in particular it comes from this publication. We will work with the output from the differential expression analysis of KO vs WT. The data files should be in a data/ subdirectory in your R working directory. You can download the data/ directory with files here.
Make a new directory for this exercise and add the data/ directory to it. Start R and use setwd() to change the working directory to the one you created.
We will use the following R packages in this exercise:
library(knitr)
library(topGO)
library(biomaRt)
library(piano)
library(NMF)
If you are not able to load any of these packages, try to install them, either using biocLite() (first you need to run source("http://www.bioconductor.org/biocLite.R")) or install.packages(). If something fails, try to understand the error message and fix it. If you get stuck, ask for help :)
diffExpRes <- read.delim("data/results_DE.txt", stringsAsFactors=F)
head(diffExpRes[,c(1,3,5,8,9)]) # skip some columns
## ensembl_gene_id mgi_symbol logFC PValue FDR
## 1 ENSMUSG00000034810 Scn7a -2.860822 4.613775e-45 6.456978e-41
## 2 ENSMUSG00000024799 Tm7sf2 -2.166921 9.630122e-44 6.738678e-40
## 3 ENSMUSG00000027200 Sema6d -1.836609 5.145875e-42 2.400551e-38
## 4 ENSMUSG00000022483 Col2a1 1.880766 2.420791e-41 7.273966e-38
## 5 ENSMUSG00000022231 Sema5a -3.039413 2.598773e-41 7.273966e-38
## 6 ENSMUSG00000023832 Acat2 -1.934613 8.941421e-41 2.085587e-37
Question 1: How many genes are significant at a cutoff of FDR<0.001?
We will start by performing overrepresentation analysis (a.k.a. list enrichment analysis, …) by using Enrichr and DAVID. Both use a scoring method similar to the Hypergeomtric/Fisher’s exact test. To do such a test, we need to have a list of interesting (e.g. differentially expressed) genes and a list of the background (also known as universe). However, both Enrichr and DAVID have their own background lists, so we do not need to specify them explicitly. First, let’s make a list of interesting genes!
write.table function to print the top significant genes to copy paste into some web browser tools:sel <- diffExpRes$ensembl_gene_id[diffExpRes$FDR<0.001]
write.table(sel, quote=F, row.names=F, col.names=F)
Question 2: What is the alternative to the Fisher Exact P-Value used by DAVID?
Question 3: Where all gene IDs recognized?
Question 4: What seems to be the main functions of the top significant genes? Does this make sense considering the experimental design?
sel object.Question 5: Where all gene IDs recognized?
Question 6: Does this list give the same results as using the Ensembl IDs?
As an alternative to DAVID we can try Enrichr. This page takes human gene symbols as input, so first we need to translate our mouse Ensembl IDs to that. This can be done in many ways, one way is to use the biomaRt package (find a user guide here or type vignette("biomaRt")):
# Use biomaRt to translate the mouse genes into human orthologs:
# Select the Ensembl BioMart database
ensembl <- useMart("ENSEMBL_MART_ENSEMBL")
# Select the mouse dataset and update the Mart object:
ensembl <- useDataset("mmusculus_gene_ensembl", mart=ensembl)
# Make the query:
bm <- getBM(attributes=c("ensembl_gene_id","hsapiens_homolog_ensembl_gene"), # this is what we want to extract
filters="ensembl_gene_id", # this determines the filter
values=sel, # this are the values to filter for (get only the genes in our list)
mart=ensembl)
head(bm)
## ensembl_gene_id hsapiens_homolog_ensembl_gene
## 1 ENSMUSG00000000093 ENSG00000121068
## 2 ENSMUSG00000000120 ENSG00000064300
## 3 ENSMUSG00000000184 ENSG00000118971
## 4 ENSMUSG00000000223 ENSG00000102385
## 5 ENSMUSG00000000253 ENSG00000137198
## 6 ENSMUSG00000000416 ENSG00000077063
As you can see, the above code gave us a map between mouse and human ensembl gene IDs. But this is not exactly what we wanted, we need human gene symbols.
attributes argument to the proper setting.Use the function listAttributes(ensembl) to see all possible options. Hint: if there are too many options to go through you can use grep to pull out the relevant ones, e.g.:
tmp <- listAttributes(ensembl)
tmp[grep("Human",tmp[,2]),]
If you are working in RStudio you can also use the convenient command View and then search for human, e.g.:
View(listAttributes(ensembl))
Try it also for the bm object. For future reference, note that you can also use listDatasets(ensembl) to see which datasetes you can choose from, try it!
Question 7: How many of the genes in our selected list have gene symbols?
duplicated and unique functions:# A vector to test stuff on:
tmp <- c(1,2,2,3,4,4,4,5)
# Get unique elements:
unique(tmp)
## [1] 1 2 3 4 5
# Which elements are duplicated?
duplicated(tmp)
## [1] FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE
# How many duplicates are there:
sum(duplicated(tmp))
## [1] 3
# Which are the duplicated elements:
tmp[duplicated(tmp)]
## [1] 2 4 4
# Which are the unique duplicated elements:
unique(tmp[duplicated(tmp)])
## [1] 2 4
Make sure you understand how these two functions work and what they do, then move ahead with the following steps and questions:
Question 8: Are there duplicates Ensembl IDs or gene symbols in the list for Enrichr (
bm)? How can this affect the results that we get?
getBM again, this time also adding information about % identity between mouse and human genes.?boxplot if you are unsure about this.Question 9: Looking at the boxplot, are there any reasons to be concerned?
Question 10: Which gene in the list has the lowest % identity?
Question 11: Are the results similar to those we got from DAVID?
Question 12: What do the edges (links) in the network denote?
Question 13: We have this far used a cutoff of FDR<0.001. Try a few different cutoffs and rerun DAVID and Enrichr. How different are the results?
Question 14: For the Enrichr and DAVID results so far, can we know whether the identified functions are active/inactive or up/down-regulated in KO vs control? If not, how can we go about to get a clue about that?
Above, we have looked at some different types of gene-set collections, but a lot of focus has been on GO-terms.
Question 15: What are the three main domains/ontologies?
Question 16: By whom and how are GO-terms maintained/defined/updated?
Question 17: What is the evidence code?
Question 18: How are GO-terms connected to each other?
We can use e.g. the topGO package to visualize gene-set analysis results on the GO hierarchy network (topGO manual).
# Format for topgo - remove genes without ensembl id:
d_topgo <- diffExpRes[!is.na(diffExpRes$ensembl_gene_id),]
rownames(d_topgo) <- d_topgo$ensembl_gene_id
# Create the topGOdata object:
GOobj <- new("topGOdata",
description = "Simple session", # just a name, can be changed
ontology = "BP", # set to BP, MF, or CC
allGenes = setNames(diffExpRes$FDR, diffExpRes$ensembl_gene_id), # named numeric vector
geneSel = function(x) x<0.01, # a function to select genes from the allGenes vector
nodeSize = 10, # remove GO-terms with less than this number of genes
mapping = "org.Mm.eg.db", # annotation database
ID = "ensembl", # gene ID used as names in allGenes vector
annot = annFUN.org)
# Run GO-term enrichment analysis:
resultFisher <- runTest(GOobj, algorithm = "classic", statistic = "fisher")
# Plot results using the GO hierarchical DAG:
showSigOfNodes(GOobj, score(resultFisher), firstSigNode=15, useInfo ='all')
If you save the plot as a PDF it will be easier to zoom in and read the node text. (Sometimes the plotting misbehaves in RStudio, try running grid.newpage(), plot.new(), and/or par(mfcol=c(1,1)) if plotting looks weird.)
Question 19: What are the circles and squares denoting?
Question 20: Which is the most significant BP GO-term?
Question 21: Which is the most significant CC GO-term?
Piano is an R-package for carrying out gene-set analysis (see more info here).
gscGO <- loadGSC("data/GSC/c5.bp.v5.2.symbols.gmt")
gscGO
## First 50 (out of 4653) gene set names:
## [1] "GO_REGULATION_O..." "GO_LACTATE_TRAN..." "GO_POSITIVE_REG..."
## [4] "GO_DETECTION_OF..." "GO_CARDIAC_CHAM..." "GO_CALCIUM_ION_..."
## [7] "GO_DNA_DEPENDEN..." "GO_TRNA_AMINOAC..." "GO_CIRCADIAN_RH..."
## [10] "GO_PHOSPHATIDYL..." "GO_SPINAL_CORD_..." "GO_REGULATION_O..."
## [13] "GO_PLATELET_DER..." "GO_GLUCURONATE_..." "GO_CELLULAR_RES..."
## [16] "GO_REGULATION_O..." "GO_POSITIVE_REG..." "GO_CEREBRAL_COR..."
## [19] "GO_POSITIVE_REG..." "GO_NEGATIVE_REG..." "GO_POTASSIUM_IO..."
## [22] "GO_L_PHENYLALAN..." "GO_REGULATION_O..." "GO_CARDIAC_MUSC..."
## [25] "GO_NEGATIVE_REG..." "GO_MOVEMENT_IN_..." "GO_REGULATION_O..."
## [28] "GO_APICAL_PROTE..." "GO_REGULATION_O..." "GO_FOREBRAIN_NE..."
## [31] "GO_POSITIVE_REG..." "GO_NEUROMUSCULA..." "GO_MITOTIC_CYTO..."
## [34] "GO_NEGATIVE_REG..." "GO_SMAD_PROTEIN..." "GO_CYTOPLASMIC_..."
## [37] "GO_MEIOTIC_CHRO..." "GO_POSITIVE_REG..." "GO_REGULATION_O..."
## [40] "GO_RNA_DEPENDEN..." "GO_REGULATION_O..." "GO_REGULATION_O..."
## [43] "GO_DENDRITE_DEV..." "GO_REGULATION_O..." "GO_G_PROTEIN_CO..."
## [46] "GO_MEMORY" "GO_NEURON_DEVEL..." "GO_REGULATION_O..."
## [49] "GO_ENDOTHELIAL_..." "GO_POSITIVE_REG..."
##
## First 50 (out of 15578) gene names:
## [1] "PDE1B" "NR4A2" "MAOB" "DRD4" "TACR3" "PARK2"
## [7] "COMT" "HTR1A" "GPR37" "HPRT1" "CHRNB2" "SNCA"
## [13] "SLC6A3" "ABAT" "DRD1" "PNKD" "PARK7" "SLC16A8"
## [19] "SLC16A5" "SLC16A4" "SLC5A12" "EMB" "SLC16A1" "SLC16A7"
## [25] "SLC16A12" "SLC16A13" "SLC16A6" "SLC16A3" "SLC16A11" "POLR2C"
## [31] "POLR2J" "CTDP1" "RDBP" "COBRA1" "RSF1" "POLR2B"
## [37] "POLR2D" "POLR2G" "POLR2F" "POLR2A" "SNW1" "CHD1"
## [43] "POLR2K" "CDK9" "JUN" "POLR2E" "CCNT1" "LEF1"
## [49] "POLR2H" "GTF2F2"
##
## Gene set size summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 17.0 34.0 111.5 92.0 1984.0
##
## No additional info available.
These are annotated with human gene symbols.
ensembl <- useMart("ENSEMBL_MART_ENSEMBL")
ensembl <- useDataset("mmusculus_gene_ensembl", mart=ensembl)
bm <- getBM(attributes=c("ensembl_gene_id","hsapiens_homolog_associated_gene_name"),
filters="ensembl_gene_id",
values=diffExpRes$ensembl_gene_id, # note that now we use all genes here!
mart=ensembl)
bm <- bm[bm[,2]%in%unique(unlist(gscGO)),] # filter for genes in the GO gene-set collection
head(bm)
## ensembl_gene_id hsapiens_homolog_associated_gene_name
## 1 ENSMUSG00000000001 GNAI3
## 2 ENSMUSG00000000028 CDC45
## 6 ENSMUSG00000000078 KLF6
## 7 ENSMUSG00000000085 SCMH1
## 8 ENSMUSG00000000088 COX5A
## 9 ENSMUSG00000000093 TBX2
Question 22: Are there any duplicates in
bm? How many? What does this mean?
bm:bm <- bm[!duplicated(bm[,1]),]
Note that multiple ENSMUS IDs still can map to the same gene symbol!
# merge diffExpRes and bm:
d_piano <- merge(diffExpRes, bm, by="ensembl_gene_id", all.x=T, sort=F)
# get geneNames, pvals and logfc:
geneNames <- d_piano$hsapiens_homolog_associated_gene_name
pvals <- d_piano$FDR
names(pvals) <- geneNames
logfc <- d_piano$logFC
names(logfc) <- geneNames
Question 23: How many unique mouse Ensembl IDs could we map to human orthologs? How many unique human orhtologs do we map to?
First, let’s use piano to perform a hypergeometric test, similar to what we did with Enrichr, DAVID, and topGO.
# Get the gene-list:
sel <- names(pvals)[pvals<0.001]
sel <- unique(sel[!is.na(sel)])
# Run the analysis:
res <- runGSAhyper(genes=sel, universe=unique(names(pvals)), gsc=gscGO, gsSizeLim=c(20, 200))
# Visualize results:
networkPlot(res, class="non", adjusted=T, significance=0.0001, ncharLabel=Inf,
nodeSize=c(3,20), edgeWidth=c(1,15), cexLabel=0.7, overlap=20,
scoreColors=c("greenyellow","darkgreen"))
Here, we use the
networkPlot function which draws a network for the most significant gene-sets.
Question 24: What do the colors, node-sizes, and edges denote?
Question 25: Are the results in line with those from topGO, DAVID, and Enrichr?
Next, we will use piano to perform gene-set analysis, i.e. not the enrichment of a gene-list based on a cut-off, but using all available gene-level statistics. We will use signed (to keep track of fold-change) -log10-pvalues to rank the gene-sets, i.e. -log10(pvals)*sign(logfc).
gsaRes <- runGSA(-log10(pvals)*sign(logfc), geneSetStat="fgsea", gsc=gscGO, gsSizeLim=c(20, 200))
networkPlot(gsaRes, "distinct", "both", adjusted=T, significance=0.05, ncharLabel=Inf,
nodeSize=c(3,15), edgeWidth=c(1,8), cexLabel=0.7, overlap=20,
scoreColors=c("red", "orange", "yellow", "blue", "lightblue", "lightgreen"))
Question 26: Could gene overlap between you gene-sets bias the result interpretation in any way?
Question 27: What is the top GO-term affected by down-regulation and up-regulation, respectively?
data/GSC/c2.cp.v5.2.symbols.gmt, which contains pathway gene-sets from MSigDB, as a gene-set collection (load it with loadGSC()).geneSetStat="mean") instead of fgsea.For example it could look something similar to this:
Question 28: Are these results in line with previous results we have seen in this exercise? Do you feel there is a consensus pattern?
It is always good to go back to the gene-level while looking at your final GSA results. Here we will use a heatmap for visualizing gene-level data for selected gene-sets (using one of many heatmap functions available for R).
# Get genes for a specific gene-set:
selGenes <- names(geneSetSummary(gsaRes, "GO_STEROID_BIOSYNTHETIC_PROCESS")$geneLevelStats)
# Merge with the diffexp result table:
selTab <- merge(cbind(selGenes), d_piano[,c(1,3,5,9,10)],
by.x=1, by.y="hsapiens_homolog_associated_gene_name", all.x=T)
selTab$log10FDR <- -log10(selTab$FDR)*sign(selTab$logFC)
# Display this table:
kable(selTab)
| selGenes | ensembl_gene_id | mgi_symbol | logFC | FDR | log10FDR |
|---|---|---|---|---|---|
| ACAA2 | ENSMUSG00000036880 | Acaa2 | 0.3227222 | 0.0729742 | 1.1368305 |
| ACBD3 | ENSMUSG00000026499 | Acbd3 | -0.2117000 | 0.8284461 | -0.0817357 |
| ACLY | ENSMUSG00000020917 | Acly | -1.0100942 | 0.0000000 | -14.8512572 |
| ACOT8 | ENSMUSG00000017307 | Acot8 | 0.1464229 | 0.8183638 | 0.0870536 |
| AKR1B15 | ENSMUSG00000061758 | Akr1b10 | 0.4710835 | 0.2556798 | 0.5923037 |
| AKR1B15 | ENSMUSG00000029762 | Akr1b8 | 0.4084569 | 0.2292405 | 0.6397086 |
| AKR1B15 | ENSMUSG00000061758 | Akr1b10 | 0.4710835 | 0.2556798 | 0.5923037 |
| AKR1B15 | ENSMUSG00000029762 | Akr1b8 | 0.4084569 | 0.2292405 | 0.6397086 |
| AKR1C3 | ENSMUSG00000021213 | Akr1c13 | 0.1587939 | 0.9247777 | 0.0339627 |
| AKR1C4 | ENSMUSG00000021213 | Akr1c13 | 0.1587939 | 0.9247777 | 0.0339627 |
| AMACR | ENSMUSG00000022244 | Amacr | 0.1090078 | 0.8615007 | 0.0647444 |
| ARV1 | ENSMUSG00000031982 | Arv1 | -0.0702522 | 0.9464355 | -0.0239090 |
| C14orf1 | ENSMUSG00000021252 | 0610007P14Rik | -0.9167006 | 0.0000000 | -9.7075763 |
| CACNA1H | ENSMUSG00000024112 | Cacna1h | 0.6248563 | 0.0628600 | 1.2016254 |
| CNBP | ENSMUSG00000030057 | Cnbp | 0.0848649 | 0.8070392 | 0.0931054 |
| CYB5R1 | ENSMUSG00000026456 | Cyb5r1 | 0.4548193 | 0.1030950 | 0.9867626 |
| CYB5R3 | ENSMUSG00000018042 | Cyb5r3 | -0.2805545 | 0.1706601 | -0.7678679 |
| CYP11A1 | ENSMUSG00000032323 | Cyp11a1 | 0.6369639 | 0.5239671 | 0.2806960 |
| CYP27A1 | ENSMUSG00000026170 | Cyp27a1 | 0.1075853 | 0.9047557 | 0.0434687 |
| CYP2R1 | ENSMUSG00000030670 | Cyp2r1 | -0.2493062 | 0.8816242 | -0.0547165 |
| CYP39A1 | ENSMUSG00000023963 | Cyp39a1 | -1.0522696 | 0.0000000 | -9.8271986 |
| CYP3A4 | ENSMUSG00000029727 | Cyp3a13 | 0.0295973 | 0.9806181 | 0.0085001 |
| CYP46A1 | ENSMUSG00000021259 | Cyp46a1 | -0.4376909 | 0.6736854 | -0.1715429 |
| CYP51A1 | ENSMUSG00000001467 | Cyp51 | -1.6531937 | 0.0000000 | -20.8889340 |
| CYP7B1 | ENSMUSG00000039519 | Cyp7b1 | -0.0504172 | 0.8895779 | -0.0508160 |
| DHCR24 | ENSMUSG00000034926 | Dhcr24 | -1.7828603 | 0.0000269 | -4.5701737 |
| DHCR7 | ENSMUSG00000058454 | Dhcr7 | -0.8584124 | 0.0000000 | -8.8078473 |
| EBP | ENSMUSG00000031168 | Ebp | -0.5232713 | 0.0206938 | -1.6841595 |
| FDFT1 | ENSMUSG00000021273 | Fdft1 | -1.4720779 | 0.0000000 | -22.7374916 |
| FDPS | ENSMUSG00000059743 | Fdps | -2.0806552 | 0.0000232 | -4.6340487 |
| FDX1 | ENSMUSG00000032051 | Fdx1 | 0.3163207 | 0.4555077 | 0.3415043 |
| FDXR | ENSMUSG00000018861 | Fdxr | -0.2715724 | 0.4486059 | -0.3481350 |
| GGPS1 | ENSMUSG00000021302 | Ggps1 | -0.1880611 | 0.7162604 | -0.1449291 |
| HINT2 | ENSMUSG00000028470 | Hint2 | 0.1070335 | 0.8354226 | 0.0780938 |
| HMGCR | ENSMUSG00000021670 | Hmgcr | -1.9043148 | 0.0000000 | -22.9236949 |
| HMGCS1 | ENSMUSG00000093930 | Hmgcs1 | -2.1349687 | 0.0000000 | -21.3281530 |
| HMGCS2 | ENSMUSG00000027875 | Hmgcs2 | 0.4020805 | 0.0251885 | 1.5987971 |
| HSD11B2 | ENSMUSG00000031891 | Hsd11b2 | 1.0354925 | 0.1483534 | 0.8287025 |
| HSD17B11 | ENSMUSG00000029311 | Hsd17b11 | 0.2584276 | 0.2948881 | 0.5303427 |
| HSD17B12 | ENSMUSG00000027195 | Hsd17b12 | -0.4645855 | 0.0014895 | -2.8269590 |
| HSD17B14 | ENSMUSG00000030825 | Hsd17b14 | -0.4382260 | 0.7236379 | -0.1404787 |
| HSD17B4 | ENSMUSG00000024507 | Hsd17b4 | -0.0393758 | 0.9229765 | -0.0348093 |
| HSD17B7 | ENSMUSG00000026675 | Hsd17b7 | -1.4288825 | 0.0000000 | -9.6138912 |
| HSD3B7 | ENSMUSG00000042289 | Hsd3b7 | 0.0582406 | 0.9228457 | 0.0348709 |
| IDI1 | ENSMUSG00000058258 | Idi1 | -1.8497690 | 0.0000000 | -7.7800272 |
| INSIG1 | ENSMUSG00000045294 | Insig1 | -1.4724813 | 0.0000000 | -20.1443378 |
| INSIG2 | ENSMUSG00000003721 | Insig2 | -0.0976972 | 0.8562685 | -0.0673900 |
| LBR | ENSMUSG00000004880 | Lbr | -0.2599271 | 0.3926051 | -0.4060441 |
| LSS | ENSMUSG00000033105 | Lss | -2.3815805 | 0.0000018 | -5.7332717 |
| MED1 | ENSMUSG00000018160 | Med1 | 0.1449697 | 0.5948939 | 0.2255605 |
| MSMO1 | ENSMUSG00000031604 | Msmo1 | -1.9228029 | 0.0000000 | -30.1111731 |
| MVK | ENSMUSG00000041939 | Mvk | -1.0533312 | 0.0000000 | -12.1796821 |
| NSDHL | ENSMUSG00000031349 | Nsdhl | -1.3307947 | 0.0000000 | -14.8501597 |
| PBX1 | ENSMUSG00000052534 | Pbx1 | 0.0237174 | 0.9547740 | 0.0200994 |
| PMVK | ENSMUSG00000027952 | Pmvk | -1.0456293 | 0.0000000 | -10.2062817 |
| PRKAA1 | ENSMUSG00000050697 | Prkaa1 | 0.7674139 | 0.3298035 | 0.4817447 |
| PRKAA2 | ENSMUSG00000028518 | Prkaa2 | 1.0590779 | 0.0226386 | 1.6451495 |
| PRKAG2 | ENSMUSG00000028944 | Prkag2 | -0.5878811 | 0.0552485 | -1.2576795 |
| SCARB1 | ENSMUSG00000037936 | Scarb1 | -0.3765647 | 0.0507384 | -1.2946633 |
| SCP2 | ENSMUSG00000028603 | Scp2 | -0.2126157 | 0.8743149 | -0.0583321 |
| SDR42E1 | ENSMUSG00000034308 | Sdr42e1 | 0.3786522 | 0.6575808 | 0.1820508 |
| SQLE | ENSMUSG00000022351 | Sqle | -1.2502197 | 0.0000000 | -14.1391715 |
| SRD5A3 | ENSMUSG00000029233 | Srd5a3 | 0.0355565 | 0.9386586 | 0.0274923 |
| STAR | ENSMUSG00000031574 | Star | -0.1712996 | 0.8121067 | -0.0903869 |
| STARD3 | ENSMUSG00000018167 | Stard3 | -0.2927755 | 0.1854065 | -0.7318751 |
| STARD5 | ENSMUSG00000046027 | Stard5 | 0.4838561 | 0.0045057 | 2.3462366 |
| TM7SF2 | ENSMUSG00000024799 | Tm7sf2 | -2.1669206 | 0.0000000 | -39.1714253 |
| TSPO | ENSMUSG00000041736 | Tspo | 0.0877881 | 0.8059029 | 0.0937173 |
| WNT4 | ENSMUSG00000036856 | Wnt4 | 1.8713034 | 0.0014986 | 2.8243258 |
Question 29: Does the table look like you expect it to?
# Read in the count data:
tableCounts <- read.table(file="data/tableCounts_with_location.tab", sep="\t", header=TRUE, skip=1)
rownames(tableCounts) <- tableCounts[,1]
tableCounts <- tableCounts[,7:12]
colnames(tableCounts) <- c("KO_1","KO_2", "KO_3", "WT_1", "WT_2", "WT_3");
# Merge count data with the selTab:
selTab <- merge(selTab, tableCounts, by.x="ensembl_gene_id", by.y=0, all.x=T)
# Plot a heatmap:
# Row annotation:
rowann <- list(Significant=ifelse(selTab$FDR<0.05,"yes","no"), Regulation=ifelse(selTab$logFC>0,"Up","Down"))
rowannCol <- list(Significant=c("black","green"), Regulation=c("blue","red"))
# Row labels:
rowlabs <- selTab$mgi_symbol
rowlabs[duplicated(rowlabs)] <- paste(rowlabs[duplicated(rowlabs)]," ")
# Plot:
aheatmap(selTab[,7:12], scale="row", annRow=rowann, annColors=rowannCol,
labRow=rowlabs,
txt=selTab[,7:12])
Note that here we scale the rows so the colors will be visualizing the difference for each gene specifically, but is not comparable across genes. Therefore we also include the actual counts in each heatmap cell.
Question 30: Does it look “better”, what is the drawback/benefit?
Question 31: Given the count data for these genes, does the differential expression results make sense, as indicated by the column annotation columns? Any specific genes that you are concerned about?
Question 32: Would you say from looking at the heatmap, that in general steroid biosynthesis is down-regulated or up-regulated? Does your answer reflect the GSA results as seen in the network plot above?
Question 33: Are these genes actually involved in steroid biosynthesis? Are you confident enough from the data and analysis results to say anything about how YAP/TAZ knockout affects this biological process?
pull out and use GO evidence code? fixa toc float? kolla subset av frågor som ska lämnas in? Lämna in kod?
This page was generated using the following R session:
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.6 (El Capitan)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-2 Rgraphviz_2.18.0 org.Mm.eg.db_3.4.0
## [4] NMF_0.20.6 cluster_2.0.5 rngtools_1.2.4
## [7] pkgmaker_0.22 registry_0.3 piano_1.14.0
## [10] biomaRt_2.30.0 topGO_2.26.0 SparseM_1.72
## [13] GO.db_3.4.0 AnnotationDbi_1.36.0 IRanges_2.8.0
## [16] S4Vectors_0.12.0 Biobase_2.34.0 graph_1.52.0
## [19] BiocGenerics_0.20.0 knitr_1.14
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.7 lattice_0.20-34 relations_0.6-6
## [4] gtools_3.5.0 assertthat_0.1 digest_0.6.10
## [7] foreach_1.4.3 gridBase_0.4-7 slam_0.1-37
## [10] plyr_1.8.4 chron_2.3-47 RSQLite_1.0.0
## [13] evaluate_0.10 highr_0.6 ggplot2_2.1.0
## [16] gplots_3.0.1 data.table_1.9.6 gdata_2.17.0
## [19] rmarkdown_1.1 sets_1.0-16 BiocParallel_1.8.0
## [22] stringr_1.1.0 igraph_1.0.1 RCurl_1.95-4.8
## [25] munsell_0.4.3 fgsea_1.0.0 marray_1.52.0
## [28] htmltools_0.3.5 tibble_1.2 gridExtra_2.2.1
## [31] codetools_0.2-15 matrixStats_0.51.0 XML_3.98-1.4
## [34] bitops_1.0-6 xtable_1.8-2 gtable_0.2.0
## [37] DBI_0.5-1 magrittr_1.5 formatR_1.4
## [40] scales_0.4.0 KernSmooth_2.23-15 stringi_1.1.2
## [43] reshape2_1.4.1 doParallel_1.0.10 limma_3.30.0
## [46] fastmatch_1.0-4 iterators_1.0.8 tools_3.3.1
## [49] yaml_2.1.13 colorspace_1.2-7 caTools_1.17.1